Annotating Chinese Collocations with Multi Information

Authors

  • Ruifeng Xu
  • Qin Lu
  • Kam-Fai Wong
  • Wenjie Li
Abstract

This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. With the help of computational tools, the bigram and n-gram collocations corresponding to 3,643 headwords are manually identified. Annotations for bigram collocations include dependency relation, chunking relation, and classification of collocation type. Currently, 23,581 bigram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus have been annotated in the collocation bank. Statistical analysis of the collocation bank is used to examine some characteristics of Chinese bigram collocations, which are essential to collocation research, especially for Chinese.

Similar resources

Annotating Information Structures In Chinese Texts Using HowNet

This paper reports our work on annotating Chinese texts with information structures derived from HowNet. An information structure consists of two components: HowNet definitions and dependency relations. It is the unit of representation of the meaning of texts. This work is part of a multi-sentential approach to Chinese text understanding. An overview of HowNet and information structure are des...

The Identification and Classification of Unknown Words in Chinese: An N-Grams-Based Approach

In this paper, we propose a new approach to identify unknown words in Chinese. This approach adopts an n-grams program to sort out the collocating word/character sequences which are possible words and phrases in Chinese. In addition to proposing the criteria for identifying Chinese new words, we also classify these new words according to their structural and semantic characteristics. The cor...

Automatic Extraction of English Collocations and their Chinese-English Bilingual Examples: A Computational Tool for Bilingual Lexicography

This paper describes the procedures involved in developing EXEC, a web-based system which can automatically extract English collocations and their Chinese-English bilingual examples from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. By taking a word as well as th...

Supervised Learning Algorithms Evaluation on Recognizing Semantic Types of Spanish Verb-Noun Collocations

The meaning of such verb-noun collocations as the wind blows, time flies, the day passes by can be generalized as ‘what is designated by the noun exists’. Likewise, the meaning of make a decision, provide support, write a letter can be generalized as ‘make what is designated by the noun’. These generalizations represent the meaning of certain groups of collocations and may be used as semantic a...

Collocation and Trillocation

In this paper we propose that the neglected three-word collocations (trillocations) should be emphasized in collocation study. From the point of view of colligations, more useful collocations could be covered by adding a third category. For a specific third word, it will help avoid the unnaturalness of a two-word collocation. A statistics-based automatic trillocation extraction system is propo...

Publication date: 2007